Iterative refinement of lexicon and phrasal alignment
Abstract
In a data-driven machine translation system, the lexicon is a core component. Sometimes it is used directly in translation, and sometimes to build other resources, such as a phrase table. Until now, however, little attention has been paid to how the information contained in these resources can also be used in the reverse direction, to help build or improve the lexicon. The system we propose here alternates between lexicon building and phrasal alignment. Evaluation on Arabic-to-English translation showed a statistically significant improvement of 1.5 BLEU points.
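The abstract gives no algorithmic detail, so the following is only a toy sketch of what alternating between lexicon building and alignment could look like, not the authors' method. It assumes a simple relative-frequency lexicon and a greedy word-level aligner as a stand-in for phrasal alignment. The function names (`build_lexicon`, `align`, `refine`) and the transliterated toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def build_lexicon(pairs, links=None):
    """Estimate p(target | source) by relative frequency of links.

    Assumption: with no links yet, every source/target co-occurrence
    inside a sentence pair counts as one link.
    """
    counts = defaultdict(Counter)
    for i, (src, tgt) in enumerate(pairs):
        if links is None:
            pair_links = [(s, t) for s in src for t in tgt]
        else:
            pair_links = links[i]
        for s, t in pair_links:
            counts[s][t] += 1
    return {
        s: {t: c / sum(ts.values()) for t, c in ts.items()}
        for s, ts in counts.items()
    }


def align(pairs, lexicon):
    """Greedy stand-in for phrasal alignment: link each source word to
    the target word in the same sentence pair with the highest score."""
    links = []
    for src, tgt in pairs:
        links.append(
            [(s, max(tgt, key=lambda t: lexicon.get(s, {}).get(t, 0.0)))
             for s in src]
        )
    return links


def refine(pairs, rounds=3):
    """Alternate alignment and lexicon re-estimation for a fixed number of rounds."""
    lexicon = build_lexicon(pairs)
    for _ in range(rounds):
        links = align(pairs, lexicon)
        lexicon = build_lexicon(pairs, links)
    return lexicon


# Hypothetical toy corpus (transliterated source words, English targets).
pairs = [
    (["kitab"], ["book"]),
    (["kitab", "kabir"], ["big", "book"]),
    (["bayt", "kabir"], ["big", "house"]),
    (["bayt"], ["house"]),
]
lexicon = refine(pairs)
```

On this toy corpus the initial co-occurrence lexicon spreads probability over several candidates, and each alignment pass concentrates it on the linked pairs. A real system would align multi-word phrases and would need a stopping criterion rather than a fixed number of rounds.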
Similar resources
An Expert Lexicon Approach to Identifying English Phrasal Verbs
Phrasal Verbs are an important feature of the English language. Properly identifying them provides the basis for an English parser to decode the related structures. Phrasal verbs have been a challenge to Natural Language Processing (NLP) because they sit at the borderline between lexicon and syntax. Traditional NLP frameworks that separate the lexicon module from the parser make it difficult to...
Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data
In this work, the use of a phrasal lexicon for statistical machine translation is proposed, and the relation between data acquisition costs and translation quality for different types and sizes of language resources has been analyzed. The language pairs are Spanish-English and Catalan-English, and the translation is performed in all directions. The phrasal lexicon is used to increase as well as...
Induction of Root and Pattern Lexicon for Unsupervised Morphological Analysis of Arabic
We propose an unsupervised approach to learning non-concatenative morphology, which we apply to induce a lexicon of Arabic roots and pattern templates. The approach is based on the idea that roots and patterns may be revealed through mutually recursive scoring based on hypothesized pattern and root frequencies. After a further iterative refinement stage, morphological analysis with the induced ...
Unsupervised Induction of a Syntax-Semantics Lexicon Using Iterative Refinement
We present a method for learning syntax-semantics mappings for verbs from unannotated corpora. We learn linkings, i.e., mappings from the syntactic arguments and adjuncts of a verb to its semantic roles. By learning such linkings, we do not need to model individual semantic roles independently of one another, and we can exploit the relation between different mappings for the same verb, or betwee...
Tone and accent in Saramaccan: Charting a deep split in the phonology of a language
Saramaccan, an Atlantic creole spoken in Surinam, has traditionally been analyzed as exhibiting a high-tone/low-tone opposition in its lexicon. However, while it is true that part of its lexicon exhibits a robust high/low opposition, the majority of its words are marked not for tone but pitch accent. The Saramaccan lexicon, therefore, is split with some words being marked for tone and other wor...
Publication date: 2007